MARVEL: an integrated alternative splicing analysis platform for single

Abstract

Alternative splicing is an important source of heterogeneity underlying gene expression between individual cells but remains understudied due to the paucity of computational tools for analyzing splicing dynamics at single-cell resolution. Here, we present MARVEL, a comprehensive R package for single-cell splicing analysis applicable to RNA sequencing data generated from plate- and droplet-based methods. We performed extensive benchmarking of MARVEL against available tools and demonstrated its utility by analyzing multiple publicly available datasets in diverse cell types, including in disease. MARVEL enables systematic and integrated splicing and gene expression analysis of single cells to characterize the splicing landscape and reveal biological insights.

INTRODUCTION

Single-cell RNA sequencing (scRNA-seq) is a powerful tool for studying transcriptional heterogeneity in normal tissues (1–5) and pathological conditions (6–11). The vast majority of scRNA-seq analyses focus on gene-level expression; however, alternative splicing represents an important additional layer of transcriptional complexity underlying gene expression (12). Alternative splicing has not been widely investigated at single-cell resolution and thus remains an untapped source of knowledge in both health and disease states. This is potentially due to the lack of computational tools that address the challenges of alternative splicing analysis at single-cell resolution, such as high dropout rates, large cell numbers, and PCR amplification biases that may distort isoform expression (13,14). Although existing analysis pipelines such as Seurat (15), Monocle (16) and Scanpy (17) enable integrative analysis workflows for single-cell gene expression, they do not support comprehensive analyses that combine gene-level and alternative splicing information.

Recently, analysis tools such as BRIE (versions 1 and 2) (18,19), Expedition (20), SCATS (21), DESJ-detection (22) and VALERIE (23) were developed to analyze alternative splicing in scRNA-seq datasets generated from plate-based platforms, e.g. Smart-seq2 (24), or microfluidic-based platforms, e.g. the Fluidigm C1 instrument. BRIE uses a Bayesian approach to learn informative sequence features for percent spliced-in (PSI) estimation, which improves PSI estimation for splicing events that have low-to-no coverage in scRNA-seq data (18,19). Expedition introduces the concept of ‘modalities’ to stratify PSI distributions into discrete categories (20). SCATS aggregates spliced reads from a group of exons generated from the same isoform(s), allowing the detection of splicing events at low sequencing depth, and it supports analysis of scRNA-seq data with or without unique molecular identifiers (UMIs) (21). DESJ-detection performs splicing analysis at the splice junction level to detect differential splicing between groups of cells (22). Lastly, VALERIE enables visual validation of candidate splicing events across large groups of single cells to identify true positive events for downstream studies (23).
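To make the PSI quantity concrete, the following minimal Python sketch computes PSI for a single skipped-exon event in one cell from splice junction read counts. It uses the generic junction-based definition (reads supporting inclusion divided by reads supporting inclusion or skipping) and is not taken from BRIE, SCATS, MARVEL or any other tool named above. The function name, argument names and the `min_reads` coverage cutoff are assumptions made for illustration only.

```python
from typing import Optional


def psi_skipped_exon(upstream_inc: int,
                     downstream_inc: int,
                     skipping: int,
                     min_reads: float = 10) -> Optional[float]:
    """Percent spliced-in (PSI) of a skipped exon in one cell.

    upstream_inc / downstream_inc: reads spanning the two junctions that
        flank the alternative exon (both support inclusion).
    skipping: reads spanning the junction that joins the two constitutive
        exons directly (supports exclusion).
    min_reads: hypothetical coverage cutoff; below it the event is treated
        as not quantifiable and None is returned.
    """
    if min(upstream_inc, downstream_inc, skipping) < 0:
        raise ValueError("read counts must be non-negative")

    # Inclusion is supported by two junctions and skipping by one, so the
    # inclusion counts are averaged to keep the two sides comparable.
    inclusion = (upstream_inc + downstream_inc) / 2.0
    total = inclusion + skipping

    if total == 0 or total < min_reads:
        return None  # too few reads to estimate PSI in this cell

    return inclusion / total
```

Returning `None` rather than 0 for poorly covered events keeps dropout distinct from true exon exclusion, which matters in scRNA-seq data where many events have low-to-no coverage in any given cell.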

However, a number of functionalities required to comprehensively characterize alternative splicing dynamics at the single-cell level are not yet available. For instance, current analysis tools focus on PSI quantification for skipped-exon (SE) and mutually exclusive exon (MXE) splicing events (18,20,21) but do not include retained introns (RI), alternative 5′ and 3′ splice sites (A5SS and A3SS), or alternative first and last exons (AFE and ALE). While SE are the major splicing event type (25), other types of splicing events are also important sources of gene expression heterogeneity and have been shown to contribute to the cellular phenotype. For example, RI are a source of neoantigens in melanoma (26), whereas A5SS, A3SS, AFE and ALE are often dysregulated in myelodysplastic syndrome (MDS) and acute myeloid leukemia (AML) patients carrying mutations in genes encoding splicing factors (25,27,28).

Modality classification enables the characterization of changes in splicing patterns across different cell populations (20). However, biases from PCR amplification and library preparation, which are prevalent in scRNA-seq, have been shown to lead to a high proportion of false positives, in particular for the bimodal classification (14). Therefore, modality assignment should account for these technical biases to enable better classification of splicing patterns.
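As a rough illustration of what assigning a modality involves, the sketch below labels the PSI distribution of one splicing event across cells with one of the five category names used by Expedition (included, excluded, middle, bimodal, multimodal). It is a deliberately naive threshold heuristic, not Expedition's method and not MARVEL's, and it does not model the technical biases discussed above. The function name and all cutoffs (`low`, `high`, `frac`, `min_side`) are hypothetical values chosen for the example.

```python
from typing import Iterable, Optional


def classify_modality(psi_values: Iterable[Optional[float]],
                      low: float = 0.25,
                      high: float = 0.75,
                      frac: float = 0.75,
                      min_side: float = 0.25) -> str:
    """Assign a coarse modality label to per-cell PSI values of one event.

    All thresholds are illustrative assumptions:
      low / high : PSI at or below `low` counts as excluded-like,
                   PSI at or above `high` counts as included-like
      frac       : fraction of cells a pattern must cover to be called
      min_side   : minimum fraction of cells required at each extreme
                   before an event is called bimodal
    """
    values = [v for v in psi_values if v is not None]  # drop unquantified cells
    if not values:
        raise ValueError("no quantified cells for this event")

    n = len(values)
    n_low = sum(v <= low for v in values)
    n_high = sum(v >= high for v in values)
    n_mid = n - n_low - n_high

    if n_high / n >= frac:
        return "included"
    if n_low / n >= frac:
        return "excluded"
    if n_mid / n >= frac:
        return "middle"
    # Bimodal: most cells sit at the extremes, with both extremes populated.
    if (n_low + n_high) / n >= frac and min(n_low, n_high) / n >= min_side:
        return "bimodal"
    return "multimodal"
```

In a heuristic like this, an event with sparse junction coverage can look bimodal simply because each cell's PSI collapses to 0 or 1 when only a few molecules are sampled. This is the kind of false positive that a bias-aware modality assignment needs to guard against.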

Taken together, current computational tools may not comprehensively facilitate the characterization of alternative splicing dynamics at single-cell resolution. Moreover, existing analysis workflows do not integrate gene expression and alternative splicing information into a single framework. Here, we introduce MARVEL, an R package for integrative single-cell alternative splicing and gene expression analysis. We benchmarked MARVEL against existing computational tools for single-cell alternative splicing analysis and demonstrated its utility by analyzing publicly available datasets generated from the plate- and droplet-based library preparation methods derived from induced pluripotent stem cells (iPSCs) differentiated into endoderm and cardiomyocytes, respectively (29,30).

MATERIALS AND METHODS

Plate-based scRNA-seq datasets

Processing of publicly available datasets

To assess and validate the performance of MARVEL on scRNA-seq data generated from plate-based library preparation protocols, we retrieved five datasets from previous studies (16,20,29,31,32). Raw sequencing reads (FASTQ) were downloaded from the Sequence Read Archive (SRA). Adapters and 3′ bases with Phred quality scores […] 100 000 mapped reads, >70% alignment rate and […] 5 000 000 mapped reads, >90% alignment rate and […] 100 000 mapped reads, >75% alignment rate and […] 100 000 mapped reads, >75% alignment rate and […] 50%, >40 000 mapped reads, and mitochondrial reads […]


